
Databrick csv cannot find local file

Ask Time: 2018-11-14T04:58:50         Author: mdivk


In a program I have a CSV extracted from Excel. I need to upload the CSV to HDFS and save it in Parquet format. Any Python or Spark version is fine, but no Scala please.

Almost all discussions I came across point to the Databricks CSV package; however, it cannot seem to find the file. Here are the code and the error:

df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema","true").option("delimiter",",").load("file:///home/rxie/csv_out/wamp.csv")

Error:

java.io.FileNotFoundException: File file:/home/rxie/csv_out/wamp.csv does not exist

The file path:

ls -la /home/rxie/csv_out/wamp.csv
-rw-r--r-- 1 rxie linuxusers 2896878 Nov 12 14:59 /home/rxie/csv_out/wamp.csv

Thank you.

Author: mdivk. Reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/53289410/databrick-csv-cannot-find-local-file
mdivk:

I found the issue now!

The reason it errors out with "file not found" is actually correct: I was creating the SparkContext with setMaster("yarn-cluster"), which means every worker node looks for the CSV on its own local filesystem. Of course all the worker nodes (except the one the program starts on, where the CSV resides) do not have this file and hence error out. What I really should do is use setMaster("local").

FIX:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = SparkConf().setAppName('test').setMaster("local")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
csv = "file:///home/rxie/csv_out/wamp.csv"
df = sqlContext.read.format("com.databricks.spark.csv") \
    .option("header", "true").option("inferSchema", "true") \
    .option("delimiter", ",").load(csv)
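The question also asks for the data to end up on HDFS in Parquet format, which the fix above stops short of. A minimal sketch of the remaining write step, assuming the DataFrame df from the fix; the HDFS destination path is a hypothetical example, not from the original post:

# Write the loaded DataFrame out to HDFS as Parquet.
# The destination path below is hypothetical; adjust it to your cluster.
df.write.mode("overwrite").parquet("hdfs:///user/rxie/wamp.parquet")

With the master set to "local", the job can still write to the cluster's HDFS as long as the Hadoop configuration is on the classpath; alternatively, copying the CSV into HDFS first (e.g. with hdfs dfs -put) would let the original yarn-cluster master find the file.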
2018-11-14T02:18:43